Search | BVS Regional Portal

Infrared: a declarative tree decomposition-powered framework for bioinformatics.

Yao, Hua-Ting; Marchand, Bertrand; Berkemer, Sarah J; Ponty, Yann; Will, Sebastian.

Algorithms Mol Biol ; 19(1): 13, 2024 Mar 16.

Article in English | MEDLINE | ID: mdl-38493130

ABSTRACT

MOTIVATION: Many bioinformatics problems can be approached as optimization or controlled sampling tasks, and solved exactly and efficiently using Dynamic Programming (DP). However, such exact methods are typically tailored towards specific settings, complex to develop, and hard to implement and adapt to problem variations. METHODS: We introduce the Infrared framework to overcome such hindrances for a large class of problems. Its underlying paradigm is tailored toward problems that can be declaratively formalized as sparse feature networks, a generalization of constraint networks. Classic Boolean constraints specify a search space consisting of putative solutions, whose evaluation is performed through a combination of features. Problems are then solved using generic cluster tree elimination algorithms over a tree decomposition of the feature network. Their overall complexities are linear in the number of variables, and only exponential in the treewidth of the feature network. For sparse feature networks, associated with low to moderate treewidths, these algorithms make it possible to find optimal solutions, or generate controlled samples, with practical empirical efficiency. RESULTS: Implementing these methods, the Infrared software allows Python programmers to rapidly develop exact optimization and sampling applications based on efficient tree decomposition-based processing. Instead of directly coding specialized algorithms, problems are declaratively modeled as sets of variables over finite domains, whose dependencies are captured by constraints and functions. Such models are then automatically solved by generic DP algorithms. To illustrate the applicability of Infrared in bioinformatics and guide new users, we model and discuss variants of bioinformatics applications. We provide reimplementations and extensions of methods for RNA design, RNA sequence-structure alignment, parsimony-driven inference of ancestral traits in phylogenetic trees/networks, and design of coding sequences. Moreover, we demonstrate multidimensional Boltzmann sampling. These applications of the framework, together with our novel results, underline the practical relevance of Infrared. Remarkably, the achieved complexities are typically equivalent to those of specialized algorithms and implementations. AVAILABILITY: Infrared is available at https://amibio.gitlabpages.inria.fr/Infrared with extensive documentation, including various usage examples and API reference; it can be installed using Conda or from source.
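The abstract states that DP over a tree decomposition costs time linear in the number of variables and exponential only in the treewidth. A minimal sketch of that principle, under stated assumptions, is the simplest possible case: a chain-shaped feature network, whose tree decomposition is a path of bags {x_i, x_{i+1}}, minimized by eliminating variables left to right. This is not Infrared's API or its cluster tree elimination code; the function names (chain_min_sum, total_cost) and the toy features (gc_reward, no_repeat) are hypothetical and chosen only for illustration.

```python
from itertools import product

def chain_min_sum(domain, n, unary, pairwise):
    """Exactly minimise a chain-shaped feature network by variable elimination.

    Total cost = sum_i unary(i, x_i) + sum_i pairwise(i, x_i, x_{i+1}).
    Runs in O(n * d^2) for a domain of size d: linear in n, and exponential
    only in the bag size (here 2, i.e. treewidth 1).
    """
    # msg[v] = best cost of x_0..x_i given x_i = v
    msg = {v: unary(0, v) for v in domain}
    pointers = []  # pointers[i-1][v] = best x_{i-1} given x_i = v
    for i in range(1, n):
        new_msg, ptr = {}, {}
        for v in domain:
            u = min(domain, key=lambda w: msg[w] + pairwise(i - 1, w, v))
            new_msg[v] = msg[u] + pairwise(i - 1, u, v) + unary(i, v)
            ptr[v] = u
        msg = new_msg
        pointers.append(ptr)
    # Traceback from the best value of the last variable.
    last = min(domain, key=lambda v: msg[v])
    assignment = [last]
    for ptr in reversed(pointers):
        assignment.append(ptr[assignment[-1]])
    assignment.reverse()
    return msg[last], assignment

def total_cost(x, unary, pairwise):
    """Evaluate an assignment directly (used to cross-check the DP)."""
    return sum(unary(i, v) for i, v in enumerate(x)) + sum(
        pairwise(i, x[i], x[i + 1]) for i in range(len(x) - 1)
    )

# Hypothetical toy features: reward G/C, penalise equal neighbours.
DOMAIN = "ACGU"

def gc_reward(i, v):
    return -1 if v in "GC" else 0

def no_repeat(i, u, v):
    return 2 if u == v else 0

best, seq = chain_min_sum(DOMAIN, 6, gc_reward, no_repeat)
brute = min(total_cost(x, gc_reward, no_repeat) for x in product(DOMAIN, repeat=6))
print(best, "".join(seq), brute)
```

The brute-force line enumerates all 4^6 sequences and serves only as a cross-check; the DP itself never enumerates full assignments, which is what keeps it linear in the sequence length.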

Mono-valent salt corrections for RNA secondary structures in the ViennaRNA package.

Yao, Hua-Ting; Lorenz, Ronny; Hofacker, Ivo L; Stadler, Peter F.

Algorithms Mol Biol ; 18(1): 8, 2023 Jul 29.

Article in English | MEDLINE | ID: mdl-37516881

ABSTRACT

BACKGROUND: RNA features a highly negatively charged phosphate backbone that attracts a cloud of counter-ions, which reduce the electrostatic repulsion in a concentration-dependent manner. Ion concentrations thus have a large influence on the folding and stability of RNA structures. Despite their well-documented effects, salt effects are not handled consistently by currently available secondary structure prediction algorithms. Combining Debye-Hückel potentials for line charges and Manning's counter-ion condensation theory, Einert et al. (Biophys J 100: 2745-2753, 2011) modeled the energetic contributions of monovalent cations on loops and helices. RESULTS: The model of Einert et al. is adapted to match the structure of the dynamic programming recursion of RNA secondary structure prediction algorithms. An empirical term describing the salt dependence of the duplex initiation energy is added to improve co-folding predictions for two or more RNA strands. The slightly modified model is implemented in the ViennaRNA package in such a way that only the energy parameters, not the algorithmic structure, are affected. A comparison with data from the literature shows that predicted free energies and melting temperatures are in reasonable agreement with experiments. CONCLUSION: The new feature in the ViennaRNA package makes it possible to study effects of salt concentrations on RNA folding in a systematic manner. Strictly speaking, the model pertains only to monovalent cations, and thus covers the most important parameter, i.e., the NaCl concentration. It remains a question for future research to what extent unspecific effects of bi- and trivalent cations can be approximated in a similar manner. AVAILABILITY: Corrections for the concentration of monovalent cations are available in the ViennaRNA package starting from version 2.6.0.
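The concentration dependence described above enters Debye-Hückel theory through the screening (Debye) length, which shrinks as salt is added. The sketch below computes only that textbook quantity for a 1:1 salt such as NaCl; it is not the Einert et al. model and not ViennaRNA's salt correction. It assumes a temperature-independent relative permittivity of 78.4 (water near 25 °C), and the function name debye_length_nm is hypothetical.

```python
import math

# SI constants (exact values from the 2019 SI redefinition, except epsilon_0)
EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m
K_B = 1.380649e-23             # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19     # elementary charge, C
N_A = 6.02214076e23            # Avogadro constant, 1/mol

def debye_length_nm(salt_molar, temperature_k=298.15, epsilon_r=78.4):
    """Debye screening length in nm for a monovalent (1:1) salt.

    For a 1:1 salt the ionic strength equals the salt concentration.
    epsilon_r is assumed constant; its temperature dependence is ignored.
    """
    if salt_molar <= 0:
        raise ValueError("salt concentration must be positive")
    ionic_strength = salt_molar * 1000.0  # mol/L -> mol/m^3
    lam_sq = (epsilon_r * EPSILON_0 * K_B * temperature_k) / (
        2.0 * N_A * E_CHARGE ** 2 * ionic_strength
    )
    return math.sqrt(lam_sq) * 1e9  # m -> nm

for c in (0.01, 0.1, 1.0):
    print(f"{c:5.2f} M NaCl: Debye length ~ {debye_length_nm(c):.2f} nm")
```

At 1 M NaCl this gives roughly 0.3 nm, and because the length scales as one over the square root of the concentration, a 100-fold dilution lengthens it tenfold, which is why the same backbone repulsion matters much more at low salt.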

Advanced Design of Structural RNAs Using RNARedPrint.

Ponty, Yann; Hammer, Stefan; Yao, Hua-Ting; Will, Sebastian.

Methods Mol Biol ; 2284: 1-15, 2021.

Article in English | MEDLINE | ID: mdl-33835434

ABSTRACT

RNA design addresses the need to build novel RNAs, e.g., for biotechnological applications in synthetic biology, equipped with desired functional properties. This chapter describes how to use the software RNARedPrint for the de novo rational design of RNA sequences adopting one or several desired secondary structures. Depending on the application, these structures could represent alternate configurations or kinetic pathways. The software makes such design convenient and sufficiently fast for practical routine, where it even overcomes notorious problems in the application of RNA design, e.g., it maintains realistic GC content.

Subjects

RNA/chemical synthesis, Software, Synthetic Biology/methods, Algorithms, Animals, Base Composition, Base Sequence, Humans, Nucleic Acid Conformation, RNA/chemistry, Riboswitch/physiology, User-Computer Interface

MentaLiST - A fast MLST caller for large MLST schemes.

Feijao, Pedro; Yao, Hua-Ting; Fornika, Dan; Gardy, Jennifer; Hsiao, William; Chauve, Cedric; Chindelevitch, Leonid.

Microb Genom ; 4(2)2018 02.

Article in English | MEDLINE | ID: mdl-29319471

ABSTRACT

MLST (multi-locus sequence typing) is a classic technique for genotyping bacteria, widely applied for pathogen outbreak surveillance. Traditionally, MLST is based on identifying sequence types from a small number of housekeeping genes. With the increasing availability of whole-genome sequencing data, MLST methods have evolved towards larger typing schemes, ranging from a few hundred genes [core genome MLST (cgMLST)] to a few thousand genes [whole genome MLST (wgMLST)]. Such large-scale MLST schemes have been shown to provide finer resolution and are increasingly used in contexts such as hospital outbreaks or foodborne pathogen outbreaks. This methodological shift raises new computational challenges, especially given the large size of the schemes involved. Very few available MLST callers are currently capable of dealing with large MLST schemes. We introduce MentaLiST, a new MLST caller based on a k-mer voting algorithm and written in the Julia language, specifically designed and implemented to handle large typing schemes. We test it on real and simulated data to show that MentaLiST is faster than any other available MLST caller while providing the same or better accuracy, and that it can handle MLST schemes with up to thousands of genes while requiring limited computational resources. MentaLiST source code and easy installation instructions using a Conda package are available at https://github.com/WGS-TB/MentaLiST.
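The abstract names k-mer voting as the core of allele calling. A simplified sketch of that general idea: index every k-mer of every known allele, let each k-mer of each read vote for all alleles that contain it, and call the most-voted allele per locus. It assumes error-free, forward-strand reads and ignores reverse complements, novel alleles and ties; it does not reproduce MentaLiST's data structures or its Julia implementation. All function names and the toy scheme are hypothetical.

```python
from collections import Counter, defaultdict

def kmers(seq, k):
    """Yield every overlapping k-mer of seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(scheme, k):
    """Map each k-mer to the set of (locus, allele_id) pairs containing it."""
    index = defaultdict(set)
    for locus, alleles in scheme.items():
        for allele_id, allele_seq in alleles.items():
            for km in kmers(allele_seq, k):
                index[km].add((locus, allele_id))
    return index

def call_alleles(reads, scheme, k):
    """Call one allele per locus: the allele receiving the most k-mer votes."""
    index = build_index(scheme, k)
    votes = Counter()
    for read in reads:
        for km in kmers(read, k):
            # Each read k-mer casts one vote for every allele that contains it.
            for hit in index.get(km, ()):
                votes[hit] += 1
    return {
        locus: max(alleles, key=lambda a: votes[(locus, a)])
        for locus, alleles in scheme.items()
    }

# Hypothetical two-locus scheme with two alleles per locus.
scheme = {
    "locusA": {1: "ACGTACGTAA", 2: "ACGTTCGTAA"},
    "locusB": {1: "GGGCCCAAAT", 2: "GGGCCCTAAT"},
}
reads = ["ACGTTCGTAA", "GGGCCCAAAT"]
calls = call_alleles(reads, scheme, k=4)
print(calls)
```

K-mers shared by all alleles of a locus add the same count to each of them, so only allele-specific k-mers decide the call; this is what lets a voting scheme skip full read alignment.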

Subjects

Bacteria/genetics, Bacterial Typing Techniques/methods, Multilocus Sequence Typing/instrumentation, Multilocus Sequence Typing/methods, Bacteria/classification, Bacteria/isolation & purification, Disease Outbreaks, Enterococcus faecium/genetics, Epidemiological Monitoring, Foodborne Diseases/microbiology, Genes, Essential, Genome, Bacterial, Humans, Molecular Epidemiology/methods, Mycobacterium tuberculosis/genetics, Salmonella/genetics, Software, Time Factors, Whole Genome Sequencing
